02:53
2026-06-16
goodfire.ai
machine-learning
Predictive Data Debugging: Reveal and Shape What Models Learn Before You Train
Goodfire researchers introduced predictive data debugging, a method that uses interpretability to predict which behaviors reinforcement learning will amplify or suppress from a preference dataset befo…